DESCRIPTION

SPEECH RECOGNITION DICTIONARY CREATION DEVICE AND 
SPEECH RECOGNITION DEVICE 

Technical Field

The present invention relates to a speech recognition
dictionary creation device for creating a dictionary used by a speech
recognition device intended for unspecified speakers, and to a
speech recognition device and the like for recognizing speech
using such a dictionary.

Background Art 

Conventionally, a speech recognition dictionary that defines the
recognition vocabulary is indispensable in a speech recognition
device intended for unspecified speakers. A previously created
speech recognition dictionary is used in the case where the words to
be recognized are definable at the time of system planning. However,
in the case where vocabulary definition is not possible or where the
vocabulary needs to be changed dynamically, the speech recognition
vocabulary is generated by manual input or automatically from
character string information, and is then registered into the
dictionary. For example, a speech recognition device in a television
program switching device performs morphemic analysis on
character string information that includes program information so as
to determine its reading, and registers the obtained reading into the
speech recognition dictionary. In the case of "NHK News 10", for
example, "enu eichi kei nyus ten (NHK News 10)" is registered into
the speech recognition dictionary as a word representing the
program. Accordingly, it becomes possible to achieve a function of
switching the channel to "NHK News 10" in response to a user saying
"enu eichi kei nyus ten (NHK News 10)".

Meanwhile, considering that a user will not always utter a word in
its complete form, there is a method for dividing a compound word
into its constituent words and registering, into a dictionary, a
paraphrase made up of partial character strings that result from
concatenating constituent words (for example, the technology
disclosed in Japanese Laid-Open Patent Application No. 2002-41081).
According to the speech recognition dictionary creation device
disclosed in this publication, words inputted as character string
information are analyzed, pairs of speaking unit/reading are then
prepared by taking into account all of their readings and all
concatenated words, and such pairs are registered into a speech
recognition dictionary. Accordingly, in the case of the
above-described television program name "NHK News 10", for
example, the readings "enu eichi kei nyus (NHK News)" and "nyus
ten (News 10)" are registered into the dictionary, thereby allowing
the user's utterance of them to be processed correctly.

Moreover, according to the above speech recognition
dictionary creation method, a paraphrase is registered into the
speech recognition dictionary after being assigned a weight in
consideration of the following, for example: a likelihood that
indicates the correctness of the reading given to the paraphrase; the
order in which the words constituting the paraphrase appear; and
the frequency at which such words are used in the paraphrase.
Accordingly, it is expected that words that are more probable as the
paraphrase can be selected by means of speech comparison.

As described above, the above conventional speech
recognition dictionary creation method aims at supporting a user's
arbitrary utterances that are given in an abbreviated manner, in
addition to complete utterances of words, by analyzing input
character string information so as to reconstruct word strings that
are made up of every combination of the analyzed words, and then
by registering, into the speech recognition dictionary, the readings
of the word strings as paraphrases of the input word.

However, the above conventional speech recognition
dictionary creation method has problems such as those described
below.

Firstly, the number of character strings becomes enormous
when character strings are generated from every combination of
words in an exhaustive manner. Thus, when all of such character
strings are registered into the speech recognition dictionary, the
size of the dictionary becomes huge, which might lead to a decrease
in the recognition rate due to an increased amount of calculation and
a large number of words that are similar in terms of phonemes.
Furthermore, since it is highly possible that character strings and
readings that are the same as those of the above paraphrases are
generated from different words, it is extremely difficult to
distinguish which word the user intends, even when a
character string and reading are correctly recognized.

Furthermore, according to the above conventional speech
recognition dictionary creation method, the weight of a paraphrase
is determined mainly by using the likelihoods of the words that
appear in the paraphrase, for the purpose of selecting the most
likely candidate paraphrase from among a large number of registered
candidate paraphrases. However, considering the case where "Kinyo
dorama (Friday Drama)" is abbreviated and uttered as "kin dora", for
example, the method does not take into account that the likelihood
used for generating a paraphrase is influenced more by the number
of phonemes extracted from the words that have been used as
constituents of a combination, and by whether it is natural in the
Japanese language to concatenate those phonemes, than by the
constituent words themselves. This causes a problem that an
appropriate value cannot be given as a likelihood to each paraphrase.

Moreover, when a word is specified, there is usually one
corresponding paraphrase. This is especially notable when the
users are limited. However, since the above speech
recognition dictionary creation method does not control the
generation of paraphrases by taking into account the use history of
the paraphrases, there is a problem that the number of paraphrases
to be generated and registered into the recognition dictionary
cannot be appropriately controlled.

Disclosure of Invention 

In view of the above, it is an object of the present invention to
provide a speech recognition dictionary creation device that
efficiently creates a speech recognition dictionary that enables even
an abbreviated paraphrase of a word to be recognized with a high
recognition rate, and to provide a high-performance speech
recognition device that uses the speech recognition dictionary
created by such speech recognition dictionary creation device and
that requires fewer resources.

In order to achieve the above object, the speech recognition
dictionary creation device according to the present invention is a
speech recognition dictionary creation device that creates a speech
recognition dictionary, the device including: an abbreviated word
generation unit that generates an abbreviated word of a recognition
object that is made up of one or more constituent words based on a
rule that takes into account ease of pronunciation; and a vocabulary
storage unit that holds, as the speech recognition dictionary, the
generated abbreviated word together with the recognition object.
Accordingly, since an abbreviated word of the recognition object is
generated based on a rule that takes into account the ease of
pronunciation and such generated abbreviated word is registered as
a speech recognition dictionary, it is possible to realize a speech
recognition dictionary creation device that efficiently creates a
speech recognition dictionary which allows even an abbreviated
paraphrase of a word to be recognized with a high recognition rate.

Here, the speech recognition dictionary creation device may
further include: a word division unit that divides the recognition
object into the constituent words; and a mora string generation unit
that generates mora strings of the respective constituent words
based on readings of the respective divided constituent words,
wherein the abbreviated word generation unit may generate the
abbreviated word made up of one or more moras by extracting one
or more moras from the mora strings of the respective constituent
words and concatenating the extracted moras based on the mora
strings of the respective constituent words generated by the mora
string generation unit. Here, the abbreviated word generation unit
may include: an abbreviated word generation rule storage unit that
holds a generation rule for generating an abbreviated word using
moras; a candidate generation unit that generates candidate
abbreviated words, each being made up of one or more moras, by
extracting one or more moras from the mora strings of the
respective constituent words and concatenating the extracted
moras; and an abbreviated word determination unit that determines
an abbreviated word for final generation, by applying the generation
rule held by the abbreviated word generation rule storage unit to the
generated candidate abbreviated words.
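
The candidate generation step described above can be illustrated with a
minimal Python sketch. It is not the implementation of the invention: the
function names, the limit of two moras per constituent word, and the mora
split assumed for "kinyo" and "dorama" are all hypothetical choices made
for illustration only.

```python
from itertools import product

def partial_mora_strings(moras, max_len=2):
    """Return the 'take nothing' choice plus every contiguous run of up to
    max_len moras of one constituent word (max_len is an assumed limit)."""
    parts = [()]  # empty tuple: no mora is extracted from this word
    for start in range(len(moras)):
        for length in range(1, max_len + 1):
            if start + length <= len(moras):
                parts.append(tuple(moras[start:start + length]))
    return parts

def generate_candidates(words, max_len=2):
    """Concatenate one partial mora string per constituent word, in order."""
    candidates = set()
    for combo in product(*(partial_mora_strings(w, max_len) for w in words)):
        joined = tuple(m for part in combo for m in part)
        if joined:  # skip the combination that extracts nothing at all
            candidates.add(joined)
    return candidates

# Assumed mora strings for "Kinyo dorama (Friday Drama)".
kinyo = ["ki", "n", "yo", "u"]
dorama = ["do", "ra", "ma"]
cands = generate_candidates([kinyo, dorama])
```

Under these assumptions, the candidate set contains ("ki", "n", "do", "ra"),
which corresponds to "kin dora", along with many less natural candidates
that the generation rules are then expected to filter out.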

With the above structure, it becomes possible to realize a
speech recognition dictionary creation device that (i) allows for the
generation of a highly likely abbreviated phrase for a new
recognition object by previously constructing a rule for generating
an abbreviated phrase by extracting partial mora strings from mora
strings of the constituent words and concatenating the extracted
partial mora strings, and (ii) realizes a speech recognition device
capable of correctly recognizing an utterance of not only the
recognition object but also an abbreviated phrase of such
recognition object by registering the generated abbreviated phrase
into the recognition dictionary as a recognition vocabulary.

Furthermore, the abbreviated word generation rule storage
unit may hold a plurality of generation rules, the abbreviated word
determination unit may calculate a likelihood under each of the
generation rules stored in the abbreviated word generation rule
storage unit and determine an utterance probability by
comprehensively taking into account the calculated likelihoods, the
utterance probability being determined for each of the generated
candidate abbreviated words, and the vocabulary storage unit may
hold the abbreviated word and the utterance probability that are
determined by the abbreviated word determination unit. Here, the
abbreviated word determination unit may determine the utterance
probability by summing up values that are obtained by multiplying
the likelihoods for the respective generation rules by corresponding
weighting factors, and the abbreviated word determination unit may
determine that a candidate abbreviated word is the abbreviated
word for final generation in the case where the utterance probability
of the candidate abbreviated word exceeds a predetermined
threshold.
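
The weighted sum and threshold test described above can be sketched as
follows. This is only an illustration: the rule names, weighting factors,
likelihood values, and threshold are hypothetical and are not taken from
the specification.

```python
def utterance_probability(likelihoods, weights):
    """Sum of per-rule likelihoods multiplied by their weighting factors."""
    return sum(weights[rule] * likelihoods[rule] for rule in weights)

def select_abbreviations(candidates, weights, threshold):
    """Keep candidates whose utterance probability exceeds the threshold."""
    selected = {}
    for word, likelihoods in candidates.items():
        p = utterance_probability(likelihoods, weights)
        if p > threshold:
            selected[word] = p
    return selected

# Hypothetical weighting factors for three generation rules.
weights = {"dependency": 0.5, "position": 0.25, "concatenation": 0.25}

# Hypothetical per-rule likelihoods for two candidate abbreviated words.
candidates = {
    "kin dora": {"dependency": 1.0, "position": 0.8, "concatenation": 0.8},
    "yo ma": {"dependency": 0.5, "position": 0.2, "concatenation": 0.2},
}

selected = select_abbreviations(candidates, weights, threshold=0.5)
```

With these assumed values, only "kin dora" passes the threshold, and it
would be stored together with its utterance probability.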

With the above structure, an utterance probability is
calculated for each of one or more abbreviated words generated for
the recognition object and then stored into the above speech
recognition dictionary in association with the respective
abbreviated words. Accordingly, it becomes possible to create a
speech recognition dictionary that realizes a speech recognition
device capable of performing recognition with high accuracy in
speech comparison, since a weight that is appropriate for the
calculated utterance probability is assigned to each abbreviated
word without having to narrow down to only one of two or more
abbreviated words generated for one recognition object, and a low
probability is assigned to an abbreviated word that is predicted to be
less likely to be used as an abbreviated word.

Moreover, the abbreviated word generation rule storage unit
may hold a first rule concerning a dependency relationship between
words, and the abbreviated word determination unit may determine,
based on the first rule, the abbreviated word for final generation
from among the candidates. For example, the first rule may
include a condition that an abbreviated word should be generated
using a modifier and a modified word as a pair, or may include a rule
indicating a relationship between the likelihood and a distance
between a modifier and a modified word that make up an
abbreviated word.

The above structure makes it possible to take into account a
relationship between words that constitute the recognition object at
the time of generating an abbreviated word of the recognition object
and thus to generate an abbreviated word that is based on a
relationship between the constituent words. Accordingly, it
becomes possible to create a speech recognition dictionary that
realizes a speech recognition device capable of performing
recognition with high accuracy. This is because it becomes possible
to exclude a word that is less likely to be included in an abbreviated
word from among the constituent words included in the recognition
object and to mainly use, in contrast, a word that is highly likely to
be included in an abbreviated word, as a result of which it becomes
possible to generate a more appropriate abbreviated word and to
prevent an abbreviated word that is less likely to be used from being
registered into the recognition dictionary.

Furthermore, the abbreviated word generation rule storage
unit may hold a second rule that is related to at least one of a length
of a partial mora string and a position of the partial mora string, the
length being a length of the partial mora string that is extracted
from a mora string of the constituent word when an abbreviated
word is generated, and the position being a position of the partial
mora string in the constituent word, and the abbreviated word
determination unit may determine, based on the second rule, the
abbreviated word for final generation from among the candidates.
For example, the second rule may include a rule indicating a
relationship between the likelihood and a number of moras
indicating the length of the partial mora string, or may include a rule
indicating a relationship between the likelihood and a number of
moras indicating a distance from a top of the constituent word to the
partial mora string, the distance indicating the position of the partial
mora string in the constituent word.
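
One possible way to hold the second rule is as two lookup tables, one
indexed by the length of the partial mora string and one indexed by its
distance from the top of the constituent word. The sketch below assumes
this table form; every numeric value and name in it is hypothetical.

```python
# Hypothetical likelihoods indexed by the number of moras extracted.
LENGTH_LIKELIHOOD = {1: 0.3, 2: 0.6, 3: 0.1}

# Hypothetical likelihoods indexed by the number of moras between the top
# of the constituent word and the start of the partial mora string.
POSITION_LIKELIHOOD = {0: 0.8, 1: 0.15, 2: 0.05}

def length_likelihood(num_moras, floor=0.01):
    """Likelihood for the length of an extracted partial mora string."""
    return LENGTH_LIKELIHOOD.get(num_moras, floor)

def position_likelihood(offset, floor=0.01):
    """Likelihood for the position of a partial mora string in its word."""
    return POSITION_LIKELIHOOD.get(offset, floor)

# "ki n" taken from the top of "ki n yo u": two moras, offset zero.
kin_length = length_likelihood(2)
kin_position = position_likelihood(0)
```

The two values are returned separately here because the text does not
state how the length and position likelihoods are combined with each
other before the per-rule weighting is applied.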

The above structure makes it possible to take into account the
number of extracted partial mora strings, the position at which each
mora appears, and the total number of moras included in the
generated abbreviated word at the time of generating an
abbreviated word by concatenating partial moras of the words that
constitute the recognition object. Accordingly, it becomes possible
to regularize a general tendency related to the extraction of
phonemes at the time of generating an abbreviated word by dividing
a long word or a phrase made up of plural words into phonemes,
using the mora, which is a basic unit of the phonemic rhythm of the
Japanese language or the like. Thus, it becomes possible to create
a speech recognition dictionary that realizes a speech recognition
device capable of performing recognition with high accuracy, since it
is possible to generate a more appropriate abbreviated word when
generating an abbreviated word of a recognition object and to
prevent an abbreviated word that is less likely to be used from being
registered into a recognition dictionary.

Moreover, the abbreviated word generation rule storage unit
may hold a third rule related to concatenated partial mora strings
that make up an abbreviated word, and the abbreviated word
determination unit may determine, based on the third rule, the
abbreviated word for final generation from among the candidates.
For example, the third rule may include a rule indicating a
relationship between the likelihood and a combination of a last mora
and a top mora, the last mora being included in the former of the
two concatenated partial mora strings and the top mora being
included in the latter of the two concatenated partial mora strings.
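
The third rule can be pictured as a table keyed by the pair (last mora of
the former string, top mora of the latter string). The following sketch
assumes such a table; the entries and the default value are hypothetical
and only illustrate the lookup.

```python
# Hypothetical likelihoods for a (last mora, top mora) junction.
CONCAT_LIKELIHOOD = {
    ("n", "do"): 0.7,
    ("ki", "ra"): 0.2,
}

def concatenation_likelihood(former, latter, default=0.05):
    """Look up the likelihood of joining two partial mora strings, using
    the last mora of the former and the top mora of the latter."""
    return CONCAT_LIKELIHOOD.get((former[-1], latter[0]), default)

# Junction of "ki n" and "do ra", as in "kin dora".
kin_dora = concatenation_likelihood(["ki", "n"], ["do", "ra"])
```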

The above structure makes it possible to regularize, in the
form of a probability of mora concatenation, a general tendency that
a phoneme sequence that is natural in the Japanese language or the
like is preferred at the time of generating an abbreviated word from
a long word or a phrase made up of plural words. Thus, it becomes
possible to create a speech recognition dictionary that realizes a
speech recognition device capable of performing recognition with
high accuracy, since it is possible to generate a more appropriate
abbreviated word when generating an abbreviated word from a
recognition object and to prevent an abbreviated word that is less
likely to be used from being registered into the recognition
dictionary.

Furthermore, the speech recognition dictionary creation
device may further include: an extraction condition storage unit that
holds a condition for extracting the recognition object from
character string information that includes the recognition object; a
character string information obtainment unit that obtains the
character string information that includes the recognition object;
and a recognition object extraction unit that extracts the recognition
object from the character string information obtained by the
character string information obtainment unit according to the
condition held by the extraction condition storage unit, and sends
the extracted recognition object to the word division unit.

The above structure makes it possible to extract a recognition
object in an appropriate manner in accordance with a condition for
extracting a recognition object from character string information
and to automatically generate an abbreviated word corresponding to
such recognition object so as to store it into the speech recognition
dictionary. Moreover, an utterance probability is calculated for
each abbreviated word generated, based on a likelihood for a rule
that has been applied at the time of abbreviated word generation,
and such utterance probability is also stored into the speech
recognition dictionary. Accordingly, it becomes possible to create a
speech recognition dictionary that realizes a speech recognition
device capable of performing recognition with high accuracy in
speech comparison, since an utterance probability is assigned to
each of one or more abbreviated words that are automatically
generated from the character string information.

Furthermore, in order to achieve the above object, the speech
recognition device according to the present invention is a speech
recognition device that recognizes an input speech by comparing the
input speech with a model corresponding to a vocabulary registered
in a speech recognition dictionary, the device recognizing the speech
using the speech recognition dictionary created by the
above-described speech recognition dictionary creation device.

The above structure makes it possible to include, as a
comparison target in recognition processing, not only a vocabulary
in a previously generated speech recognition dictionary but also a
vocabulary in the speech recognition dictionary that stores a
recognition object extracted from character string information and
an abbreviated word generated from such recognition object by the
speech recognition dictionary creation device of the present
invention. Accordingly, it becomes possible to realize a speech
recognition device that is capable of correctly recognizing not only a
fixed vocabulary such as a command, but also a vocabulary
extracted from the character string information, such as a search
keyword, as well as its abbreviated word, regardless of which one of
them is uttered.

Here, the speech recognition device according to the present
invention is a speech recognition device that recognizes an input
speech by comparing the input speech with a model corresponding
to a vocabulary registered in a speech recognition dictionary, the
device including the above-described speech recognition dictionary
creation device and recognizing the speech using the speech
recognition dictionary created by the speech recognition dictionary
creation device.

With the above structure, the extraction of a recognition
object and the generation of its abbreviated word are automatically
carried out by inputting the character string information to the
integrated speech recognition dictionary creation device, and they
are stored into the speech recognition dictionary. Since it is
possible for the speech recognition device to compare a speech with
the vocabularies stored in the speech recognition dictionary, it
becomes possible for a speech recognition device whose vocabulary
needs to be added to or changed dynamically to
automatically extract such vocabulary and its abbreviated word
from the character string information and register them into the
speech recognition dictionary.

Here, the abbreviated word and the utterance probability of
the abbreviated word may be registered into the speech recognition
dictionary together with the recognition object, and the recognition
unit may recognize the speech by taking into account the utterance
probability registered in the speech recognition dictionary. The
speech recognition device may generate a candidate for a
recognition result of the speech and a likelihood of the candidate,
add a likelihood corresponding to the utterance probability to the
generated likelihood, and output the candidate as a final recognition
result based on the resulting addition value.
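
The addition described above can be sketched as a rescoring step. The
sketch assumes that the candidate likelihood is a log-likelihood and that
the "likelihood corresponding to the utterance probability" is its natural
logarithm; neither assumption is stated in the text, and all names and
values are hypothetical.

```python
import math

def rescore(candidates, dictionary):
    """Pick the candidate with the highest combined score.

    candidates: list of (entry, log_likelihood) pairs from speech comparison.
    dictionary: mapping from entry to its registered utterance probability.
    """
    best_entry, best_score = None, -math.inf
    for entry, log_likelihood in candidates:
        # Assumed form of the addition: work in the log domain.
        score = log_likelihood + math.log(dictionary[entry])
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry, best_score

# Hypothetical dictionary entries and comparison results.
dictionary = {"kin dora": 0.9, "kin ma": 0.1}
candidates = [("kin ma", -10.0), ("kin dora", -10.5)]
result, score = rescore(candidates, dictionary)
```

In this example "kin ma" has the slightly better comparison likelihood,
but its low utterance probability lowers its combined score, so
"kin dora" is output as the recognition result.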

With the above structure, an utterance probability of each
abbreviated word is calculated and stored into the speech
recognition dictionary in the process of extracting a recognition
object from the character string information and generating its
abbreviated word. Accordingly, it becomes possible for the speech
recognition device to perform a comparison by taking into account
the utterance probability of each abbreviated word at the time of
speech comparison and to perform a control so that a lower
probability is assigned to a less likely abbreviated word. As a
result, it becomes possible to minimize the reduction in the
accuracy of speech recognition due to an excessive
generation of unnatural abbreviated words.

Moreover, the speech recognition device may further include:
an abbreviated word use history storage unit that holds, as use
history information, an abbreviated word recognized for the speech
and a recognition object corresponding to the abbreviated word;
and an abbreviated word generation control unit that controls
generation of an abbreviated word by the abbreviated word
generation unit based on the use history information held by the
abbreviated word use history storage unit. For example, the
abbreviated word generation unit of the speech recognition
dictionary creation device may include: an abbreviated word
generation rule storage unit that holds a generation rule for
generating an abbreviated word using moras; a candidate
generation unit that generates candidate abbreviated words, each
being made up of one or more moras, by extracting one or more
moras from the mora strings of the respective constituent words and
concatenating the extracted moras; and an abbreviated word
determination unit that determines an abbreviated word for final
generation, by applying the generation rule held by the abbreviated
word generation rule storage unit to the generated candidate
abbreviated words, and the abbreviated word generation control unit
may control the generation of the abbreviated word by making one
of a change, a deletion, and an addition to the generation rule held
by the abbreviated word generation rule storage unit.

Similarly, the speech recognition device may further include:
an abbreviated word use history storage unit that holds, as use
history information, an abbreviated word recognized for the speech
and a recognition object corresponding to the abbreviated word;
and a dictionary revision unit that revises the abbreviated word
stored in the speech recognition dictionary based on the use history
information held by the abbreviated word use history storage unit.
For example, the abbreviated word and the utterance probability of
the abbreviated word may be registered into the speech recognition
dictionary together with the recognition object, and the dictionary
update unit may revise the abbreviated word by changing the
utterance probability of the abbreviated word.
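
A revision of utterance probabilities from use history might look like
the following sketch. The update formula (a blend of the registered
probability and the observed relative frequency, controlled by a
hypothetical rate parameter) is an assumption made for illustration; the
text only states that the utterance probability is changed based on the
use history.

```python
from collections import Counter

def revise_probabilities(entries, history, rate=0.5):
    """Shift utterance probabilities toward the observed use frequency.

    entries: mapping from abbreviated word to its registered utterance
             probability, all for the same recognition object.
    history: abbreviated words the user actually uttered for that object.
    rate:    hypothetical blending factor between 0 and 1.
    """
    if not history:
        return dict(entries)  # nothing observed yet, keep the dictionary
    counts = Counter(history)
    total = sum(counts.values())
    return {
        abbr: (1 - rate) * prob + rate * counts[abbr] / total
        for abbr, prob in entries.items()
    }

# Hypothetical entries for "Kinyo dorama" and a short use history.
entries = {"kin dora": 0.6, "kin ma": 0.4}
revised = revise_probabilities(entries, ["kin dora", "kin dora"])
```

With these values the probability of "kin dora" rises to 0.8 and that of
the unused "kin ma" falls to 0.2. An entry whose probability keeps
falling could then be deleted from the dictionary.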

The above structure makes it possible to control the
abbreviated word generation rule by taking into account the user's
tendency regarding the use of abbreviated words, based on the
history information about the user's use of abbreviated words in the
past. This is a result of focusing on the fact that there is a certain
tendency in the user's use of abbreviated words and that the
number of abbreviated words used by the user for the same word is
two at most. In other words, it becomes possible to generate,
when newly generating abbreviated words, only those abbreviated
words that are judged to be highly likely to be used, based on the
past use of abbreviated words. Furthermore, as for abbreviated
words that are already stored in the recognition dictionary, if such
abbreviated words are ones generated from the same word and it
has become obvious that only one of them is used and the others are
not used, it becomes possible to delete the unused abbreviated
words from the dictionary. Such a function prevents an excessive
number of abbreviated words from being registered into the
recognition dictionary and minimizes the degradation in the
performance of speech recognition. Furthermore, also in the case
where a common abbreviated word is included in abbreviated words
that are generated for different recognition objects, it is possible to
predict which recognition object the user intends from
information indicating the user's specific use of abbreviated words in
the past.

Note that it is possible to embody the present
invention not only as a speech recognition dictionary creation device
and a speech recognition device as described above, but also as a
speech recognition dictionary creation method and a speech
recognition method that include, as their respective steps, the
characteristic components included in these devices, as well as
programs that cause a computer to execute these steps. It should
also be understood that such programs can be distributed on a
recording medium such as a CD-ROM and over a communication
medium such as the Internet.

Brief Description of Drawings 

FIG. 1 is a functional block diagram showing a structure of a
speech recognition dictionary creation device according to a first
embodiment of the present invention.

FIG. 2 is a flowchart showing dictionary creation processing
performed by the above speech recognition dictionary creation
device.

FIG. 3 is a flowchart showing a detailed procedure of the
abbreviated word generation process (S23) shown in FIG. 2.

FIG. 4 is a diagram showing a processing table (a table that
holds intermediate data and the like that are temporarily generated)
held by an abbreviated word generation unit of the above speech
recognition dictionary creation device.

FIG. 5 is a diagram showing an example of abbreviated word
generation rules stored in an abbreviated word generation rule
storage unit of the above speech recognition dictionary creation
device.

FIG. 6 is a diagram showing an example of the speech
recognition dictionary stored in a vocabulary storage unit of the
above speech recognition dictionary creation device.

FIG. 7 is a functional block diagram showing a structure of a
speech recognition device according to a second embodiment of the
present invention.

FIG. 8 is a flowchart showing a learning function of the above
speech recognition device.

FIGS. 9A and 9B are diagrams showing an application
example of the above speech recognition device.

FIG. 10A is a diagram showing example abbreviated words
generated by the speech recognition dictionary creation device 10
from a recognition object in the Chinese language, and FIG. 10B is a
diagram showing example abbreviated words generated by the
speech recognition dictionary creation device 10 from a recognition
object in the English language.

Best Mode for Carrying Out the Invention 

The following describes the embodiments of the present 
invention with reference to the drawings. 

(First Embodiment)

FIG. 1 is a functional block diagram showing a structure of a
speech recognition dictionary creation device 10 according to the
first embodiment. The present speech recognition dictionary
creation device 10, which is a device that generates an abbreviated
word from a recognition object and registers it into a dictionary, is
comprised of: a recognition object analysis unit 1 and an
abbreviated word generation unit 7 that are implemented as a
program, a logical circuit, or the like; and an analysis word
dictionary storage unit 4, an analysis rule storage unit 5, an
abbreviated word generation rule storage unit 6, and a vocabulary
storage unit 8 that are implemented as storage devices such as a
hard disk and a non-volatile memory.

The analysis word dictionary storage unit 4 stores, in advance,
a dictionary related to word units (morphemes) and the definitions
of their phoneme sequences (phonemic information) that are used
for dividing a recognition object into its constituent words. The
analysis rule storage unit 5 stores, in advance, rules (rules
concerning syntactic analysis) for dividing a recognition object into
the word units stored in the analysis word dictionary storage unit 4.

The abbreviated word generation rule storage unit 6 stores, in
advance, a plurality of rules concerning the generation of an
abbreviated word of a previously constructed word, i.e., a plurality
of rules that take into account the ease of pronunciation. For
example, such rules include: a rule for determining, from among the
constituent words of the recognition object, a word from which a
partial mora string should be extracted, based on the constituent
words themselves and on their respective dependency relationships;
a rule for extracting appropriate partial moras based on the positions
from which partial moras are extracted from the constituent words,
the number of extracted moras, and the total number of moras
resulting from combining the extracted moras; a rule for
concatenating partial moras based on whether or not it is natural to
concatenate such extracted moras; and so forth.

Note that "mora", which is a phoneme considered as one
sound (one beat), corresponds approximately to each of the hiragana
characters when a Japanese word is written in hiragana.
Furthermore, a mora corresponds to one sound in haiku when counted
in a 5-7-5 pattern. Note, however, that as for palatalized
consonants (sounds that are followed by small "ya", "yu", and "yo"),
double consonants (small "tu"/choked sound), and the syllabic nasal
/N/, whether they are treated as an independent syllable or not
depends on whether they are pronounced as one sound (one beat)
or not. For example, "Tokyo" consists of four moras "to", "u", "kyo",
and "u", "Sapporo" consists of four moras "sa", "p", "po", and "ro",
and "Gunma" consists of three moras "gu", "n", and "ma".
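The mora counts given above can be reproduced with a minimal sketch such as the following. It assumes that the word is supplied in hiragana and that only the small "ya", "yu", and "yo" merge with the preceding character; the function name split_moras is hypothetical and is not part of the described device.

```python
# Small kana that merge with the preceding character into a single mora
# (palatalized sounds such as "kyo"). Assumption: only the three small
# kana named in the text are handled; the long-vowel mark is not.
SMALL_KANA = set("ゃゅょ")


def split_moras(hiragana: str) -> list[str]:
    """Split a hiragana string into moras (one beat each)."""
    moras: list[str] = []
    for ch in hiragana:
        if ch in SMALL_KANA and moras:
            moras[-1] += ch      # "き" + "ょ" -> "きょ" (one mora)
        else:
            moras.append(ch)     # "っ" and "ん" count as their own beat
    return moras


print(split_moras("とうきょう"))  # ['と', 'う', 'きょ', 'う']
print(split_moras("さっぽろ"))    # ['さ', 'っ', 'ぽ', 'ろ']
print(split_moras("ぐんま"))      # ['ぐ', 'ん', 'ま']
```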

The recognition object analysis unit 1, which is a processing
unit that performs morphemic analysis, syntax analysis, mora
analysis, or the like on the recognition object inputted to the speech
recognition dictionary creation device 10, is comprised of a word
division unit 2 and a mora string obtainment unit 3. The word
division unit 2 divides the input recognition object into words that
constitute such recognition object (constituent words) according to
information about words stored in the analysis word dictionary
storage unit 4 and the syntax analysis rule stored in the analysis rule
storage unit 5, and generates a dependency relationship between
the resulting constituent words (information indicating a
relationship between a modifier and a modified word). The mora
string obtainment unit 3 generates a mora string for each of the
constituent words generated by the word division unit 2, based on
the phonemic information about the words stored in the analysis
word dictionary storage unit 4. Results of analysis performed by
the recognition object analysis unit 1, i.e., information generated by
the word division unit 2 (information about the constituent words of
the recognition object and a dependency relationship among the
respective words) and information generated by the mora string
obtainment unit 3 (mora strings indicating phoneme sequences of
the respective constituent words) are sent to the abbreviated word
generation unit 7.

The abbreviated word generation unit 7 generates zero or
more abbreviated words of the recognition object from the
information about the recognition object sent from the recognition
object analysis unit 1, using the abbreviated word generation rules
stored in the abbreviated word generation rule storage unit 6.
More specifically, the abbreviated word generation unit 7 generates
candidate abbreviated words by combining mora strings of the
respective words sent from the recognition object analysis unit 1
based on their dependency relationship, and calculates likelihoods
of the generated candidate abbreviated words for each of the rules
stored in the abbreviated word generation rule storage unit 6.
Then, after assigning a constant weight to the likelihoods and adding
up the resulting likelihoods, the abbreviated word generation unit 7
calculates an utterance probability of each of the candidates, and
stores, into the vocabulary storage unit 8, a candidate with an
utterance probability above a certain level as a finally generated
abbreviated word, in association with the utterance probability and
the original recognition object. In other words, an abbreviated
word that is judged by the abbreviated word generation unit 7 as
having an utterance probability above a certain level is stored into
the vocabulary storage unit 8 as a speech recognition dictionary
together with its utterance probability and information indicating
that such word has the same meaning as that of the input
recognition object.

The vocabulary storage unit 8 holds rewritable speech
recognition dictionaries and performs registration processing. The
vocabulary storage unit 8 associates the abbreviated word and its
utterance probability generated by the abbreviated word generation
unit 7 with the recognition object inputted to the
speech recognition dictionary creation device 10, and registers such
recognition object, abbreviated word, and utterance probability as a
speech recognition dictionary.

Next, providing concrete examples, a description is given of
operations performed by the speech recognition dictionary creation
device 10 with the above structure.

FIG. 2 is a flowchart showing dictionary creation operations
performed by the respective units included in the speech recognition
dictionary creation device 10. In the drawing, illustrated on the left
of the arrows are specific data to be generated such as intermediate
data, final data, and the like in the case where "asa no renzoku
dorama (Morning drama series)" is inputted as a recognition object,
whereas illustrated on the right are names of data to be referred to
or to be stored.

First, in Step S21, the recognition object is read into the word
division unit 2 of the recognition object analysis unit 1. The word
division unit 2 divides the recognition object into its constituent
words according to information about the words stored in the
analysis word dictionary storage unit 4 and the word division rule
stored in the analysis rule storage unit 5, and determines a
dependency relationship among the respective constituent words.
In other words, the word division unit 2 performs morphemic
analysis and syntax analysis. Accordingly, the recognition object
"asa no renzoku dorama" is divided, for example, into constituent
words "asa", "no", "renzoku", and "dorama", and
(asa)->((renzoku)->(dorama)) is generated as a dependency
relationship. In this representation of the dependency relationship,
a word from which an arrow is extending indicates a modifier,
whereas a word pointed to by an arrow indicates a modified word.

In Step S22, the mora string obtainment unit 3 assigns, as a
phoneme sequence, a mora string to each of the constituent words
obtained in the word division processing step S21. In the present
step, the phonemic information of the words stored in the analysis
word dictionary storage unit 4 is used to obtain the phoneme
sequences of the respective constituent words. As a result, "a sa",
"no", "re n zo ku", and "do ra ma" are provided as mora strings of the
constituent words obtained in the word division unit 2, "asa", "no",
"renzoku", and "dorama". The mora strings that are generated in
the above manner are sent to the abbreviated word generation unit
7 together with information about the constituent words and
dependency relationship obtained in Step S21.
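Steps S21 and S22 may be pictured, in a much simplified form, by the following sketch. The dictionary contents, the greedy longest-match strategy, and the names ANALYSIS_WORDS, divide_words, and mora_strings are assumptions made for illustration only; the description does not specify the division algorithm, and the determination of the dependency relationship is not covered here.

```python
# Toy analysis word dictionary: word -> mora string.
# Assumption: the entries and the greedy longest-match strategy are
# illustrative; the source does not specify the division algorithm.
ANALYSIS_WORDS = {
    "asa": ["a", "sa"],
    "no": ["no"],
    "renzoku": ["re", "n", "zo", "ku"],
    "dorama": ["do", "ra", "ma"],
}


def divide_words(text: str) -> list[str]:
    """Greedy longest-match division of a romanized string into known words."""
    words, pos = [], 0
    while pos < len(text):
        for end in range(len(text), pos, -1):   # try the longest span first
            if text[pos:end] in ANALYSIS_WORDS:
                words.append(text[pos:end])
                pos = end
                break
        else:
            raise ValueError(f"no dictionary word at position {pos}")
    return words


def mora_strings(words: list[str]) -> dict[str, list[str]]:
    """Look up the mora string of each constituent word (Step S22)."""
    return {w: ANALYSIS_WORDS[w] for w in words}


words = divide_words("asanorenzokudorama")
print(words)                 # ['asa', 'no', 'renzoku', 'dorama']
print(mora_strings(words))
```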

In Step S23, the abbreviated word generation unit 7 generates
abbreviated words based on the constituent words, dependency
relationship, and mora strings sent from the recognition object
analysis unit 1. When this is done, one or more of the rules stored
in the abbreviated word generation rule storage unit 6 are applied.
Such rules include: a rule for determining, from among the
constituent words of the recognition object, a word from which a
partial mora string should be extracted based on the constituent
words themselves and their dependency relationship; a rule for
extracting appropriate partial moras based on positions in the
respective constituent words from which such partial moras are
extracted, the number of extracted moras, and a total number of
moras resulting from combining the extracted moras; a rule for
concatenating partial moras based on whether it is natural or not to
concatenate such extracted moras; and so forth. The abbreviated
word generation unit 7 calculates a likelihood based on each of the
rules to be applied when generating abbreviated words, the
likelihood indicating the degree to which each abbreviated word
satisfies the applied rule. Then, by summing up the likelihoods for
the respective rules, the abbreviated word generation unit 7
calculates an utterance probability of each of the generated
abbreviated words. As a result, "asadora", "rendora", and
"asarendora" are generated as abbreviated words, to which a higher
utterance probability is assigned in this order.

In Step S24, the vocabulary storage unit 8 stores, into the
speech recognition dictionary, pairs of the abbreviated words and
their utterance probabilities generated by the abbreviated word
generation unit 7, in association with the recognition object. In this
manner, the speech recognition dictionary that contains the
abbreviated words of the recognition object and their utterance
probabilities is generated.
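One possible in-memory layout for such a dictionary is sketched below. The class and field names (DictionaryEntry, SpeechRecognitionDictionary, surface) and the probability values are hypothetical; the sketch only illustrates that each abbreviated word is held together with its utterance probability and the recognition object it stands for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    surface: str                  # word to be recognized (abbreviated word)
    utterance_probability: float
    recognition_object: str       # original recognition object it stands for


class SpeechRecognitionDictionary:
    def __init__(self) -> None:
        self._entries: dict[str, list[DictionaryEntry]] = {}

    def register(self, recognition_object, abbreviations):
        """Store (abbreviated word, utterance probability) pairs for one object."""
        for surface, probability in abbreviations:
            entry = DictionaryEntry(surface, probability, recognition_object)
            self._entries.setdefault(surface, []).append(entry)

    def lookup(self, surface):
        """Return every entry registered under the given word."""
        return self._entries.get(surface, [])


# Probabilities below are placeholders, not values from the source.
dictionary = SpeechRecognitionDictionary()
dictionary.register(
    "asa no renzoku dorama",
    [("asadora", 0.3), ("rendora", 0.2), ("asarendora", 0.1)],
)
print(dictionary.lookup("asadora")[0].recognition_object)
```

Keeping a list per word allows the same abbreviated word to be registered for more than one recognition object.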

Next, referring to FIGS. 3 to 5, a description is given of a
detailed procedure of the abbreviated word generation processing
(S23) shown in FIG. 2. FIG. 3 is a flowchart showing such detailed
procedure, FIG. 4 shows a processing table (table that holds
intermediate data and the like that are temporarily generated) held
by the abbreviated word generation unit 7, and FIG. 5 is a diagram
showing an example of abbreviated word generation rules 6a stored
in the abbreviated word generation rule storage unit 6.

First, the abbreviated word generation unit 7 generates
candidate abbreviated words based on the constituent words,
dependency relationship, and mora strings sent from the recognition
object analysis unit 1 (S30 in FIG. 3). More specifically, the
abbreviated word generation unit 7 generates candidate
abbreviated words by combining each of all the modifiers and
modified words indicated in the dependency relationship among the
constituent words sent from the recognition object analysis unit 1.
When this is done, as illustrated as "Candidate abbreviated word" in
the processing table of FIG. 4, not only the mora strings of the
constituent words, but also partial mora strings that are results of
deleting a part of the respective mora strings, are used as modifiers
and modified words. For example, in the case of a modifier
"renzoku" and a modified word "dorama", not only "renzokudorama",
but also all possible mora strings that are obtained by deleting one
or more moras are generated as candidate abbreviated words.
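The candidate generation of S30 can be sketched as follows for one modifier and one modified word. The sketch assumes that a partial mora string is a contiguous run of moras; the description may also intend deletions from the middle of a word. The function names are hypothetical.

```python
from itertools import product


def partial_mora_strings(moras: list[str]) -> list[list[str]]:
    """All non-empty contiguous runs of moras, including the full string."""
    n = len(moras)
    return [moras[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def candidates(modifier: list[str], modified: list[str]) -> set[str]:
    """Concatenate every partial modifier with every partial modified word."""
    return {
        "".join(left + right)
        for left, right in product(
            partial_mora_strings(modifier), partial_mora_strings(modified)
        )
    }


result = candidates(["re", "n", "zo", "ku"], ["do", "ra", "ma"])
print("rendora" in result, "renzokudorama" in result)  # True True
```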

Next, the abbreviated word generation unit 7 repeats the 
following processes (S30 to S36 in FIG. 3) for each of the generated 
candidate abbreviated words (from S31 in FIG. 3): calculates a
likelihood based on each of the abbreviated word generation rules
stored in the abbreviated word generation rule storage unit 6 (S32 
to S34 in FIG. 3); and calculates each utterance probability by 
summing up the likelihoods based on a certain weight (S35 in FIG. 
3). 

For example, suppose that a rule concerning dependency
relationship is defined as one of the abbreviated word generation
rules as shown as Rule 1 in FIG. 5, which defines that a modifier and
a modified word should be concatenated in this order and which
defines a function or the like indicating that a likelihood becomes
higher as the distance (the number of stages in the dependency
relationship shown at the top of FIG. 4) between a modifier and a
modified word is shorter. In this case, the abbreviated word
generation unit 7 calculates likelihoods in accordance with such Rule
1 for each of the candidate abbreviated words. In the case of
"rendora", for example, after confirming that it is an abbreviated
word whose modifier and modified word are concatenated in the
defined order (otherwise, its likelihood is 0), the distance between
the modifier "ren" and the modified word "dora" (here, one stage
since "ren(zoku)" modifies "dora(ma)") is determined, and a
likelihood corresponding to such distance (here, 0.102) is
determined according to the above function.

Meanwhile, in the case of "asadora", the distance between the
modifier "asa" and the modified word "dora" is two stages since
"asa" modifies "renzoku dorama", whereas in the case of
"asarendora", the distance between the modifier and the modified
word is 1.5 stages, which is the mean value of the two distances,
since "asarendora" has dependency relationships for both "rendora"
and "asadora".
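The stage counts used for Rule 1 (one stage, two stages, and the mean of 1.5) can be reproduced by the sketch below. It assumes that the dependency relationship is held as nested (modifier, modified) pairs and that every modifier is a single word; the function names are hypothetical, and the function that maps a distance to a likelihood is not shown because it is not given in the description.

```python
from statistics import mean

# Dependency (asa)->((renzoku)->(dorama)) as nested (modifier, modified) pairs.
DEPENDENCY = ("asa", ("renzoku", "dorama"))


def head_depth(node) -> int:
    """Number of dependency stages from the top of `node` down to its head word."""
    depth = 0
    while isinstance(node, tuple):
        _, node = node           # descend into the modified part
        depth += 1
    return depth


def modifier_distances(node) -> dict[str, int]:
    """Map each modifier word to its distance (in stages) from the head word."""
    distances = {}
    while isinstance(node, tuple):
        modifier, modified = node
        distances[modifier] = 1 + head_depth(modified)
        node = modified
    return distances


def rule1_distance(used_modifiers, distances):
    """Mean distance over the modifiers that appear in a candidate."""
    return mean(distances[m] for m in used_modifiers)


distances = modifier_distances(DEPENDENCY)
print(distances)                                      # {'asa': 2, 'renzoku': 1}
print(rule1_distance(["asa", "renzoku"], distances))  # 1.5 ("asarendora")
```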

Furthermore, suppose that a rule concerning partial mora
strings is defined as another example of the abbreviated word
generation rules as shown as Rule 2 in FIG. 5, which defines rules or
the like concerning the position and length of a partial mora string.
More specifically, as the rule concerning the position of a partial
mora string, a rule is defined specifying that a likelihood becomes
higher as the position of a mora string (partial mora string)
determined to be used as a modifier or a modified word is located
closer to the top of its original constituent word. In other words, a
function or the like is defined that indicates a relationship between
the distance from the top (the number of moras between the top of
the original constituent word and the top of the partial mora string)
and a likelihood. In addition, as the rule concerning the length of a
partial mora string, a rule is defined specifying that a likelihood
becomes higher as the number of moras making up a partial mora
string is closer to two. In other words, a function that indicates a
relationship between the length of a partial mora string (the number
of moras) and a likelihood is defined. The abbreviated word
generation unit 7 calculates a likelihood of each of the candidate
abbreviated words in accordance with such Rule 2. In the case of
"asadora", for example, the position and length of each of the partial
mora strings "asa" and "dora" are determined, and a likelihood of
each of them is determined in accordance with the above function.
Then, the mean value of the resulting likelihoods is determined
(here, 0.128) as a likelihood for Rule 2.

Moreover, suppose that a rule concerning the concatenation
of morphemes is defined as another example of the abbreviated
word generation rules as shown as Rule 3 in FIG. 5, which defines a
rule or the like concerning a concatenated part of partial mora
strings. Here, as the rule concerning a concatenated part of partial
mora strings, a data table is defined specifying that a likelihood
becomes low in the case where two partial mora strings are
concatenated, and the last mora in the fore partial mora string and
the top mora in the rear partial mora string are unnaturally
concatenated from the standpoint of phonemic combination
(phonemes that are difficult to pronounce). The abbreviated word
generation unit 7 calculates a likelihood in accordance with Rule 3 of
each of the candidate abbreviated words. More specifically, the
abbreviated word generation unit 7 judges whether or not each
concatenated part of partial mora strings applies to any of unnatural
concatenations registered in Rule 3. The abbreviated word
generation unit 7 assigns a likelihood accordingly, when any of them
applies, whereas it assigns the default likelihood (here, 0.050)
otherwise. For example, in the case of "asarendora", it is judged
whether "sare" that is the concatenated part of partial mora strings
"asa" and "ren" applies to any of unnatural concatenations
registered in Rule 3. Here, since none of them applies, the default
likelihood (here, 0.050) is assigned.
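The table lookup of Rule 3 may be sketched as follows. The table entries are placeholders and are not taken from FIG. 5, and taking the lowest score when a candidate has several concatenated parts is an assumption; only the default likelihood of 0.050 comes from the description. The names UNNATURAL_JUNCTIONS and rule3_likelihood are hypothetical.

```python
DEFAULT_LIKELIHOOD = 0.050  # default value quoted in the text

# Hypothetical table: (last mora of fore string, top mora of rear string)
# -> likelihood. The entries are placeholders, not taken from FIG. 5.
UNNATURAL_JUNCTIONS = {
    ("n", "n"): 0.010,
    ("ku", "ku"): 0.020,
}


def rule3_likelihood(parts: list[list[str]]) -> float:
    """Score the junctions between consecutive partial mora strings."""
    scores = [
        UNNATURAL_JUNCTIONS.get((fore[-1], rear[0]), DEFAULT_LIKELIHOOD)
        for fore, rear in zip(parts, parts[1:])
    ]
    # Assumption: with several junctions, the worst one decides.
    return min(scores, default=DEFAULT_LIKELIHOOD)


# "asarendora" = "asa" + "ren" + "dora"; junctions are sa|re and n|do.
print(rule3_likelihood([["a", "sa"], ["re", "n"], ["do", "ra"]]))  # 0.05
```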

As described above, after a likelihood of each of the candidate
abbreviated words is calculated under the application of each of the
abbreviated word generation rules, the abbreviated word generation
unit 7 calculates an utterance probability of each candidate by
summing up each likelihood x that is multiplied by weight (weight a
shown in FIG. 5 that is defined on a rule-by-rule basis) according to
the formula shown in Step S35 in FIG. 3 for determining an utterance
probability P(w) (S35 in FIG. 3).

Finally, the abbreviated word generation unit 7 identifies,
from all the candidates, candidate(s) with an utterance probability
above a predetermined threshold, and outputs them to the
vocabulary storage unit 8 as the finally generated abbreviated
words, together with their utterance probabilities (S37 in FIG.
3). Accordingly, as shown in FIG. 6, the vocabulary storage unit 8
creates a speech recognition dictionary 8a that contains the
abbreviated words of the recognition object and their utterance
probabilities.
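The weighted summation of S35 and the threshold selection of S37 can be sketched as below, reading the formula as a sum of each likelihood x multiplied by the weight a of its rule. The weights, the threshold, and every likelihood other than 0.102, 0.128, and 0.050 are placeholders chosen for the example; the function names are hypothetical.

```python
def utterance_probability(likelihoods, weights):
    """P(w) = sum over rules of (weight a_i * likelihood x_i)."""
    return sum(weights[rule] * x for rule, x in likelihoods.items())


def select_abbreviations(candidates, weights, threshold):
    """Keep candidates whose P(w) is above the threshold, best first."""
    scored = {w: utterance_probability(x, weights) for w, x in candidates.items()}
    kept = [(w, p) for w, p in scored.items() if p > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Weights and most likelihoods are placeholders; only 0.102, 0.128, and
# 0.050 are values quoted in the text.
WEIGHTS = {"rule1": 0.5, "rule2": 0.25, "rule3": 0.25}
CANDIDATES = {
    "asadora":    {"rule1": 0.08, "rule2": 0.128, "rule3": 0.05},
    "rendora":    {"rule1": 0.102, "rule2": 0.06, "rule3": 0.05},
    "zokudorama": {"rule1": 0.102, "rule2": 0.02, "rule3": 0.01},
}
selected = select_abbreviations(CANDIDATES, WEIGHTS, threshold=0.07)
for word, p in selected:
    print(f"{word}: {p:.4f}")
```

With these placeholder values, "asadora" and "rendora" pass the threshold and "zokudorama" is discarded.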

The speech recognition dictionary 8a that has been created in
the above manner contains not only the recognition object, but also
its abbreviated words and their utterance probabilities. Thus, the
use of the speech recognition dictionary created by the present
speech recognition dictionary creation device 10 makes it possible
to provide a speech recognition device that is capable of recognizing
a speech with high recognition rate regardless of whether a word is
uttered in a formal manner or in an abbreviated manner, by
detecting that they are utterances of the same intention. For
example, in the case of "asa no renzoku dorama", regardless of
whether the user says "asanorenzokudorama" or "asadora", it is
recognized that such utterance means "asa no renzoku dorama" and
a speech recognition dictionary with the same functionality is
created for the speech recognition device.

(Second Embodiment) 

The second embodiment relates to an example of a speech
recognition device that is integrated with the speech recognition
dictionary creation device 10 of the first embodiment, and that uses
the speech recognition dictionary 8a created by such speech
recognition dictionary creation device 10. The speech recognition
device related to the present embodiment has a dictionary update
function of automatically extracting a recognition object from
character string information and storing it into the speech
recognition dictionary and a function of preventing less likely
abbreviated words from being registered into the recognition
dictionary by controlling the generation of abbreviated words using
information that is based on the user's history of using abbreviated
words. Note that the character string information is information
that includes a word to be recognized (recognition object) by the
speech recognition device. For example, in the case of a speech
recognition device that automatically switches the channel to a
television program based on the name of a television program
uttered by a viewer watching digital television broadcasting, the
name of the television program serves as a recognition object and
electronic program data broadcast from a broadcast station serves
as character string information.

FIG. 7 is a functional block diagram showing a structure of a
speech recognition device 30 according to the second embodiment.
Such speech recognition device 30 is equipped with a character
string information capturing unit 17, a recognition object extraction
condition storage unit 18, a recognition object extraction unit 19, a
speech recognition unit 20, a user I/F unit 25, an abbreviated word
use history storage unit 26, and an abbreviated word generation rule
control unit 27, in addition to the speech recognition dictionary
creation device 10 of the first embodiment. Note that the speech
recognition dictionary creation device 10 is the same as the one
presented in the first embodiment, and therefore a description
thereof is not repeated here.

The character string information capturing unit 17, the
recognition object extraction condition storage unit 18, and the
recognition object extraction unit 19 are intended for extracting a
recognition object from the character string information that
includes such recognition object. According to the present
structure, the character string information capturing unit 17
captures the character string information that includes the
recognition object, and the recognition object extraction unit 19 in
the subsequent stage extracts the recognition object from such
character string information. In preparation for extracting the
recognition object from the character string information, morphemic
analysis is performed on the character string information, and then
the recognition object is extracted according to a recognition object
extraction condition stored in the recognition object extraction
condition storage unit 18. The extracted recognition object is sent
to the speech recognition dictionary creation device 10, which is
followed by the generation of its abbreviated words and their
registration into the recognition dictionary.

Accordingly, it becomes possible for the speech recognition
device 30 according to the present embodiment to automatically
extract a search keyword, such as a television program name, from
character string information such as electronic program data, and
then to create a speech recognition dictionary that makes it possible
to correctly perform speech recognition regardless of whether the
keyword or an abbreviated word generated therefrom is uttered.
Note that the recognition object extraction condition stored in the
recognition object extraction condition storage unit 18 is, for
example, information for identifying electronic program data
included in digital broadcast data to be inputted to a digital
broadcast receiver and information for identifying the name of a
television program included in electronic program data.

The speech recognition unit 20 is a processing unit that
performs speech recognition of an input speech inputted via a
microphone or the like based on the speech recognition dictionary
created by the speech recognition dictionary creation device 10.
Such speech recognition unit 20 is comprised of an acoustic analysis
unit 21, an acoustic model storage unit 22, a fixed vocabulary
storage unit 23, and a comparison unit 24. The acoustic analysis
unit 21 performs frequency analysis or the like on the speech
inputted via the microphone or the like so as to convert it into a
sequence of feature parameters (e.g., mel-cepstrum coefficients).
The comparison unit 24 synthesizes models for recognizing the
respective vocabularies and compares the resultant with the input
speech, using a model stored in the acoustic model storage unit 22
(e.g., hidden Markov model and Gaussian mixture distributions)
based on the vocabulary (fixed vocabulary) stored in the fixed
vocabulary storage unit 23 or the vocabulary (normal words and
abbreviated words) stored in the vocabulary storage unit 8. As a
result, words that are given higher likelihoods are sent to the user
I/F unit 25 as candidate recognition results.

With the above structure, by storing, into the fixed
vocabulary storage unit 23, vocabulary that can be determined at
the time of system construction, such as a device control command
(e.g., an utterance "kirikae (switch to another)" to be uttered when
switching a television program to another) and by storing, into the
vocabulary storage unit 8, vocabulary, such as a television program
to be switched to, that needs to be variably changed in response to
changes in the name of a television program, it becomes possible to
simultaneously recognize both of such vocabularies.

Furthermore, the vocabulary storage unit 8 stores not only
abbreviated words but also their utterance probabilities. The
utterance probabilities are used by the comparison unit 24 to
perform speech comparison. By making it less easy to recognize an
abbreviated word with low utterance probability, it is possible to
prevent the decrease in the performance of the speech recognition
device due to an excessive generation of abbreviated words. For
example, the comparison unit 24 adds the likelihood corresponding
to an utterance probability (e.g., the logarithmic value of the
utterance probability) stored in the vocabulary storage unit 8 to a
likelihood indicating the correlation between the input speech and a
vocabulary stored in the vocabulary storage unit 8, and determines
the resulting addition value as a final likelihood of the recognition
result. When such final likelihood exceeds a predetermined
threshold, the comparison unit 24 sends such vocabulary to the user
I/F unit 25 as a candidate recognition result. Note that when there
are a plurality of candidate recognition results whose likelihood
exceeds the predetermined threshold, only those included in
predetermined ranks in descending order of likelihood are sent to
the user I/F unit 25.
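The addition of the logarithmic utterance probability, the threshold test, and the rank limit can be sketched as follows. The sketch assumes that the likelihood from the speech comparison is already expressed in the log domain; all scores, the threshold, and the name rank_candidates are placeholders for illustration.

```python
import math


def rank_candidates(acoustic_scores, utterance_probabilities, threshold, top_n):
    """Add log P(w) to each acoustic log-likelihood, then filter and rank."""
    final = {
        word: score + math.log(utterance_probabilities[word])
        for word, score in acoustic_scores.items()
    }
    above = [(w, s) for w, s in final.items() if s > threshold]
    above.sort(key=lambda item: item[1], reverse=True)
    return above[:top_n]   # only the predetermined number of ranks


# Placeholder scores: log-domain acoustic likelihoods and stored P(w).
acoustic = {"asadora": -10.0, "asarendora": -9.5, "rendora": -14.0}
stored = {"asadora": 0.30, "asarendora": 0.05, "rendora": 0.20}
for word, score in rank_candidates(acoustic, stored, threshold=-13.0, top_n=2):
    print(f"{word}: {score:.2f}")
```

In this example "asarendora" has the best acoustic score but is ranked below "asadora" because of its low utterance probability, which is the effect described above.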

Meanwhile, there is a possibility that the speech recognition
dictionary creation device 10 as above generates abbreviated words
with identical phoneme sequences for a plurality of different
recognition objects. This problem is caused by the ambiguity of the
abbreviated word generation rules. It is assumed in ordinary cases
that the user uses one abbreviated word to mean one corresponding
recognition object. Thus, what is required is a speech recognition
device that is capable of presenting an appropriate operation based
on an uttered abbreviated word by overcoming the ambiguity of the
abbreviated word generation rules and that has a learning function
that improves the recognition rate over a long period of usage. The
user I/F unit 25, the abbreviated word use history storage unit 26,
and the abbreviated word generation rule control unit 27 are the
components intended for such learning function.

In other words, in the case of a failure to narrow down the
candidate recognition results to one candidate as a result of the
speech comparison performed by the comparison unit 24, the user
I/F unit 25 presents such plurality of candidates to the user so as to
obtain a selection instruction from the user. For example, the user
I/F unit 25 displays, on the television screen, a plurality of
candidates for recognition result (plural names of television
programs to be switched to) that have been obtained in response to
a user's utterance. Accordingly, it becomes possible for the user to
have a desired operation (program switching by speech) performed
by selecting the correct candidate from among them by use of a
remote control or the like.

The abbreviated words that are sent to the user I/F unit 25 or
the abbreviated word that has been selected by the user from
among those sent by the user I/F unit 25 in the above manner are
sent to the abbreviated word use history storage unit 26 as history
information and stored therein. The history information stored in
the abbreviated word use history storage unit 26 is evaluated in the
abbreviated word generation rule control unit 27 and is used to
change rules and parameters intended for generating abbreviated
words stored in the abbreviated word generation rule storage unit 6
as well as to change parameters intended for calculating utterance
probabilities of the abbreviated words. At the same time, in the
case where a one-to-one correspondence is established between an
original word and its abbreviated word based on a user's usage of
the abbreviated word, such information is stored into the
abbreviated word generation rule storage unit 6 as well. Such
information regarding addition/change/deletion of rules stored in
the abbreviated word generation rule storage unit 6 is sent also to
the vocabulary storage unit 8, where the already registered
abbreviated words are reviewed and the dictionary is updated
accordingly by deleting or changing abbreviated words.

FIG. 8 is a flowchart showing a learning function of the speech
recognition device 30 with the above structure.

In the case where candidate recognition results sent from the
comparison unit 24 include an abbreviated word stored in the
vocabulary storage unit 8, the user I/F unit 25 causes the
abbreviated word use history storage unit 26 to accumulate such
abbreviated word by sending it to the abbreviated word use history
storage unit 26 (S40). When this is done, the abbreviated word
selected by the user is sent to the abbreviated word use history
storage unit 26 together with information indicating such fact.

The abbreviated word generation rule control unit 27
generates regularity by statistically analyzing the abbreviated
words stored in the abbreviated word use history storage unit 26 at
predetermined time intervals or every time a predetermined amount
of information is stored in the abbreviated word use history storage
unit 26 (S41). For example, the abbreviated word generation rule
control unit 27 generates a frequency distribution related to the
length of abbreviated words (the number of moras), a frequency
distribution related to a sequence of moras constituting abbreviated
words, or the like. When it has been confirmed, based on
information about user's selection or the like, that the television
program name "asa no renzoku dorama" is abbreviated as "rendora",
for example, the abbreviated word generation rule control unit 27
also generates information indicating a one-to-one correspondence
between the recognition object and the abbreviated word. After
generating regularity as described above, the abbreviated word
generation rule control unit 27 deletes the contents stored in the
abbreviated word use history storage unit 26 to get ready for future
accumulation.

Then, according to the generated regularity, the abbreviated
word generation rule control unit 27 performs one of addition,
change, and deletion of the abbreviated word generation rules
stored in the abbreviated word generation rule storage unit 6 (S42).
For example, based on the frequency distribution concerning the
length of abbreviated words, the abbreviated word generation rule
control unit 27 makes an amendment to the rule concerning the
length of partial mora strings (e.g., a parameter for obtaining the
mean value, out of function parameters indicating the distribution)
included in Rule 2 shown in FIG. 5. Furthermore, in the case where
information indicating a one-to-one correspondence between a
recognition object and an abbreviated word is generated, the
abbreviated word generation rule control unit 27 registers such
correspondence as a new abbreviated word generation rule.
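The statistical analysis of S41 that feeds the amendment of S42 can be sketched as below. The history format, the sample data, and the name analyze_history are assumptions; how the resulting mean is written back into the length parameter of Rule 2 depends on the function stored in the abbreviated word generation rule storage unit 6 and is not shown.

```python
from collections import Counter
from statistics import mean


def analyze_history(history):
    """history: list of (mora list of an abbreviated word, selected_by_user)."""
    length_distribution = Counter(len(moras) for moras, _ in history)
    mean_length = mean(len(moras) for moras, _ in history)
    return length_distribution, mean_length


# Hypothetical usage history accumulated in step S40.
history = [
    (["a", "sa", "do", "ra"], False),
    (["re", "n", "do", "ra"], True),
    (["re", "n", "do", "ra"], True),
    (["a", "sa", "re", "n", "do", "ra"], False),
]
distribution, mean_length = analyze_history(history)
print(distribution)   # Counter({4: 3, 6: 1})
print(mean_length)    # 4.5
```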

As described above, the abbreviated word generation unit 7 
reviews the speech recognition dictionary stored in the vocabulary 
storage unit 8, by repeatedly generating an abbreviated word of the
recognition object according to the abbreviated word generation
rules on which addition/change/deletion has been performed (S43).
For example, when having re-calculated the utterance probability of
the abbreviated word "asadora" in accordance with such new
abbreviated word generation rules, the abbreviated word generation
unit 7 updates the utterance probability, whereas when the user
selects "rendora" as an abbreviated word of the recognition object
"asa no renzoku dorama", the abbreviated word generation unit 7
raises the utterance probability of the abbreviated word "rendora".

As described above, since the present speech recognition
device 30 is capable of not only performing speech recognition for
abbreviated words, but also updating the abbreviated word
generation rules in accordance with a recognition result so as to
revise the speech recognition dictionary accordingly, it becomes
possible to achieve a learning function that improves the recognition 
rate over a period of usage. 
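
As a rough illustration of this learning step, the sketch below raises the utterance probability of the abbreviated word the user selected. The embodiment does not state how much the probability is raised, so the additive `boost` and the cap at 1.0 are assumptions, as are the function name and the nested-dictionary layout.

```python
def reinforce(dictionary, recognition_object, selected, boost=0.1):
    """Return a copy of the dictionary with one abbreviation's probability raised.

    dictionary maps a recognition object to {abbreviated word: probability}.
    The additive boost and the 1.0 cap are assumptions made for this sketch.
    """
    entries = dict(dictionary.get(recognition_object, {}))
    entries[selected] = min(1.0, entries.get(selected, 0.0) + boost)
    updated = dict(dictionary)
    updated[recognition_object] = entries
    return updated
```

With a dictionary holding "asadora" and "rendora" for "asa no renzoku dorama", calling `reinforce(d, "asa no renzoku dorama", "rendora")` raises only the "rendora" entry and leaves the others as they were.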

FIG. 9A is a diagram showing an application example of the 
above-described speech recognition device 30.

Illustrated in the drawing is a system for automatically 
switching a television program to another in response to a speech. 
Such system is composed of: a set-top box (STB: a digital broadcast 
receiver) 40 that contains the speech recognition device 30; a TV
receiver 41; and a remote control 42 that is capable of functioning
as a wireless microphone. An utterance of the user is sent to the 
STB 40 via the microphone of the remote control 42 as speech data, 
and is speech-recognized by the speech recognition device 30 
contained in the STB 40. Accordingly, the television program is
switched to another in accordance with the result of such
recognition. 

For example, suppose the case where a user's utterance is
"rendora ni kirikae (switch the channel to the rendora)". In this
case, such speech is sent to the speech recognition device 30
contained in the STB 40 via the remote control 42. As shown in the
processing procedure of FIG. 9B, the speech recognition unit 20 of
the speech recognition device 30 detects that the input speech
"rendora ni kirikae" contains a variable vocabulary "rendora" (i.e.,
the recognition object "asa no renzoku dorama") and a fixed
vocabulary "kirikae", with reference to the vocabulary storage unit 8
and fixed vocabulary storage unit 23. Based on this result, the STB
40 exercises control for selecting the television program "asa no
renzoku dorama" (here, Channel 6) after confirming that the
electronic program data that has been previously received and
stored as broadcast data includes such television program currently
on the air. 
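
The control flow of this example can be sketched as follows. The sketch works on already-recognized word tokens rather than speech data, and it assumes the variable vocabulary, the fixed vocabulary and the program data are plain dictionaries; the function names and the command label "switch" are hypothetical.

```python
def interpret(words, variable_vocab, fixed_vocab):
    """Split recognized words into (recognition_object, command)."""
    target = None
    command = None
    for word in words:
        if word in variable_vocab:
            # An abbreviated word resolves to its recognition object.
            target = variable_vocab[word]
        elif word in fixed_vocab:
            command = fixed_vocab[word]
    return target, command


def select_channel(target, command, programs_on_air):
    """Return the channel to select, or None if no switch should happen."""
    if command != "switch":
        return None
    # Confirm that the program data lists the program as currently on the air.
    return programs_on_air.get(target)
```

For the utterance above, the tokens `["rendora", "ni", "kirikae"]` would resolve to the recognition object "asa no renzoku dorama" and the switching command, and the channel is returned only when the program appears in the stored program data.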

As described above, according to the speech recognition
device of the present embodiment, not only is it possible to
simultaneously recognize a fixed vocabulary such as a command for
device control and a variable vocabulary such as a television
program name used for searching for a program, but also to perform
desired processing by associating the control of a device or the like
with a fixed vocabulary, a variable vocabulary, and further its
abbreviated word. What is more, it also becomes possible to
efficiently create a speech recognition dictionary with a high
recognition rate by providing a learning function that takes into
account the user's past use history, thereby overcoming the
ambiguity related to the process of generating abbreviated words. 

The speech recognition dictionary creation device and speech 
recognition device according to the present invention have been
described as above, but the present invention is not limited to the
aforementioned embodiments.

More specifically, the first and second embodiments present 
an example of the speech recognition dictionary creation device 10 
and speech recognition device 30 intended for the Japanese 
language, but it should be understood that the present invention is
applicable not only to the Japanese language, but also to other
languages such as the Chinese language and the English language.
FIG. 10A is a diagram showing example abbreviated words
generated by the speech recognition dictionary creation device 10
from a Chinese recognition object, whereas FIG. 10B is a diagram
showing example abbreviated words generated by the speech
recognition dictionary creation device 10 from an English
recognition object. These abbreviated words can be generated
under abbreviated word generation rules depicted in FIG. 5 as the
abbreviated word generation rules 6a such as "the top one syllable
of the recognition object is used as an abbreviated word" and
"concatenation of the top one syllables of the respective words
constituting the recognition object is used as an abbreviated word".

Also, the speech recognition dictionary creation device 10
according to the first embodiment has been described as generating
abbreviated words with high utterance probability, but
non-abbreviated normal words may also be generated. For
example, the abbreviated word generation unit 7 may register, into
the speech recognition dictionary of the vocabulary storage unit 8,
not only abbreviated words but also a mora string corresponding to
a non-abbreviated recognition object as a fixed mora string,
together with a predetermined utterance
probability. Alternatively, it is also possible to simultaneously
recognize a normal word spelled in full and an abbreviated word by
causing the speech recognition device to include, as a recognition
object, not only abbreviated words registered in its speech
recognition dictionary, but also recognition objects serving as
indexes of the speech recognition dictionary.
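
The two syllable-based rules quoted earlier, together with registration of the non-abbreviated form, can be sketched as below. The sketch assumes the recognition object arrives already split into words and syllables; automatic syllabification is outside its scope. The function names, the entry layout and the default probability are hypothetical, and the sample phrase in the usage note is not taken from FIG. 10B.

```python
def top_syllable(words):
    """Rule: the top one syllable of the recognition object."""
    return words[0][0]


def concatenated_top_syllables(words):
    """Rule: concatenation of the top one syllable of each constituent word."""
    return "".join(syllables[0] for syllables in words)


def dictionary_entry(words, full_form_probability=0.5):
    """Build one dictionary entry holding the full form and its abbreviations.

    words is a list of words, each given as a list of syllables. The
    predetermined probability of the full form is an assumed default.
    """
    full_form = " ".join("".join(syllables) for syllables in words)
    abbreviations = []
    for candidate in (top_syllable(words), concatenated_top_syllables(words)):
        if candidate not in abbreviations:  # both rules coincide for one word
            abbreviations.append(candidate)
    return {
        "index": full_form,
        "full_form": (full_form, full_form_probability),
        "abbreviations": abbreviations,
    }
```

For a made-up recognition object given as `[["mon", "day"], ["night"], ["mov", "ie"]]`, the first rule yields "mon" and the second yields "monnightmov", while the full form is registered alongside them with the predetermined probability.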

Furthermore, the abbreviated word generation rule control 
unit 27 according to the first embodiment has been described as
making a change to the abbreviated word generation rules stored in
the abbreviated word generation rule storage unit 6, but it may
directly make a change to the contents of the vocabulary storage
unit 8. More specifically, addition, change, or deletion may be
performed on abbreviated words registered in the speech
recognition dictionary 8a stored in the vocabulary storage unit 8,
and the utterance probabilities of the registered abbreviated
words may be increased or decreased. Accordingly, the
speech recognition dictionary is directly revised based on the use 
history information stored in the abbreviated word use history 
storage unit 26. 

Furthermore, the abbreviated word generation rules stored in
the abbreviated word generation rule storage unit 6 as well as the
definitions of words used in the rules are not limited to those used in
the present embodiment. For example, in the present embodiment,
although the distance between a modifier and a modified word
indicates a stage in a dependency relationship diagram, the present
invention is not limited to such definition, and thus "the distance
between a modifier and a modified word" may be defined as a value
that indicates whether a connection of a modifier and a modified
word is appropriate or not from a semantic viewpoint. For example,
in the case of "(burning red (evening sun))" and "(bright blue
(evening sun))", since the former is natural from a semantic
viewpoint, a standard may be adopted by which it is indicated that
the distance is closer in the former case.

Furthermore, the second embodiment presents, as an 
application example of the speech recognition device 30, automatic 
program switching performed in a digital broadcast receiving system,
but such automatic program switching is not limited to a one-way
communication system such as a broadcast system, and thus the
present invention is also applicable to a two-way communication
system such as the Internet or a telephone network. For example,
by integrating the speech recognition device of the present
invention into a mobile telephone, it becomes possible to realize a
content distribution system in which a user's specification of a
desired content is speech-recognized, and such content is
downloaded from a website on the Internet. For example, when the
user says "Kuma P wo download (download kuma P)", a variable
vocabulary "kuma P (an abbreviated word of "Kuma no P-san (Bear
named pi)")" and a fixed vocabulary "download" are recognized, and
a mobile phone ringing melody "Kuma no P-san" is downloaded to 
the mobile phone from a website on the Internet. 

Similarly, the speech recognition device 30 of the present
invention is not limited to a communication system such as a
broadcast system and a content distribution system, and thus is also
applicable to a stand-alone device. For example, by integrating the
speech recognition device 30 of the present invention into a car
navigation device, it is possible to realize a convenient,
highly safe car navigation device that is capable of recognizing a
place name or the like of a destination uttered by a driver and
automatically displaying a map to such destination. For example,
when a driver says, "kadokado wo hyouji (Display kadokado)", a
variable vocabulary "kadokado (an abbreviated word of "Oaza
Kadoma, Kadoma-Shi, Osaka")" and a fixed vocabulary "hyouji
(display)" are recognized, and a map of the neighborhood of "Oaza
Kadoma, Kadoma-Shi, Osaka" is automatically displayed on the
screen of the car navigation device.

As described above, the present invention makes it possible 
to create a speech recognition dictionary intended for a speech
recognition device that operates in the same manner in both cases
where a recognition object is uttered in a formal manner and where
it is uttered in an abbreviated manner. Furthermore, since
abbreviated word generation rules focusing on moras, which are the
rhythmic units of speech in the Japanese language, are applied
and weights are assigned to abbreviated words in consideration of
their respective utterance probabilities, it becomes possible to
prevent abbreviated words from being unnecessarily generated and
registered into the recognition dictionary and to prevent generated
abbreviated words from adversely affecting the performance of the
speech recognition device through a combined use of weighting. 

Moreover, the speech recognition device integrated with the
above-described speech recognition dictionary creation device is
capable of constructing a speech recognition dictionary in an
efficient manner since it is possible to resolve the problem caused by
a many-to-many relationship between original words and abbreviated
words that is attributable to the ambiguity of the abbreviated word
generation rules, by the speech recognition dictionary creation unit
utilizing the user's history about the use of abbreviated words.

Furthermore, since the speech recognition device of the
present invention establishes a feedback system for reflecting a
recognition result in the process of creating a speech recognition
dictionary, it is possible to achieve a learning effect in which the
recognition rate becomes higher over a period of using the device.

As described above, since the present invention is capable of 
recognizing a speech that includes an abbreviated word with a high
recognition rate, it becomes possible through a speech that includes
an abbreviated word to switch a television program to another,
operate a mobile phone, and provide an instruction or the like to a
car navigation device. Thus, the present invention is capable of
offering a highly significant practical value.

Industrial Applicability

It is possible to use the present invention as a speech
recognition dictionary creation device for creating a dictionary used
for a speech recognition device intended for an unspecified speaker 
and as a speech recognition device and the like for performing 
speech recognition using such dictionary. The present invention, in
particular, is applicable to a speech recognition device or the like for
recognizing a vocabulary that includes an abbreviated word,
examples of which are a digital broadcast receiver and a car navigation
device.